{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "gpuType": "T4"
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU",
    "widgets": {
      "application/vnd.jupyter.widget-state+json": {
        "86dca4d17526449d931390a7dc4c468c": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "AudioModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "AudioModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "AudioView",
            "autoplay": false,
            "controls": true,
            "format": "mp3",
            "layout": "IPY_MODEL_38fa5d0d6708431395d9c7f5c5e211f5",
            "loop": false
          }
        },
        "38fa5d0d6708431395d9c7f5c5e211f5": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        }
      }
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Learn OpenAI Whisper - Chapter 6\n",
        "## Using Whisper and YouTube for transcription and translation\n",
        "This notebook provides a simple template for using OpenAI's Whisper and YouTube for audio transcription in Google Colab.\n",
        "\n",
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1NhpG_iZSRxbENy8yXm5exiXyHcBIG34A)\n",
        "\n",
        "## Install Whisper\n",
        "Run the cell below to install Whisper.\n",
        "\n",
        "The Python libraries `openai`, `cohere`, and `tiktoken` are also installed because of dependencies for the `llmx` library. That is because `llmx` relies on them to function correctly. Each of these libraries provides specific functionalities that `llmx` uses.\n",
        "\n",
        "1. `openai`: This is the official Python library for the OpenAI API. It provides convenient access to the OpenAI REST API from any Python 3.7+ application. The library includes type definitions for all request parameters and response fields, and offers both synchronous and asynchronous clients powered by `httpx`.\n",
        "\n",
        "2. `cohere`: The Cohere platform builds natural language processing and generation into your product with a few lines of code. It can solve a broad spectrum of natural language use cases, including classification, semantic search, paraphrasing, summarization, and content generation.\n",
        "\n",
        "3. `tiktoken`: This is a fast Byte Pair Encoding (BPE) tokenizer for use with OpenAI's models. It's used to tokenize text into subwords, a necessary step before feeding text into many modern language models."
      ],
      "metadata": {
        "id": "w2wt2RJygcRR"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "id": "fk9Jtws0xqLw"
      },
      "outputs": [],
      "source": [
        "%%capture\n",
        "!pip install -q cohere openai tiktoken\n",
        "!pip install -q \"git+https://github.com/openai/whisper.git\"\n",
        "!pip install -q \"git+https://github.com/garywu007/pytube.git\""
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import re\n",
        "from pytube import YouTube\n",
        "\n",
        "video_url = \"https://youtu.be/5bs9XoTac88\" #@param {type:\"string\"}\n",
        "# episode_date = \"20231220-\" #@param {type:\"string\"}\n",
        "drive_folder = \"\" #@param {type:\"string\"}\n",
        "\n",
        "yt = YouTube(video_url)\n",
        "episode_date = yt.publish_date.strftime('%Y%m%d-')\n",
        "source_audio = drive_folder + episode_date + (re.sub('[^A-Za-z0-9 ]+', '', yt.title).replace(' ', '_')) + \".mp4\"\n",
        "\n",
        "audio_file = YouTube(video_url).streams.filter(only_audio=True).first().download(filename=source_audio)\n",
        "print(f\"Downloaded '{source_audio}\")"
      ],
      "metadata": {
        "id": "83zLdk5qzrM1",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "18d7f8a9-b9c3-48ef-faa4-73e4ac7652e1"
      },
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Downloaded '20140328-1993_Procesador_Intel_i486_DX2_Anuncio_Asegrese_de_que_su_prximo_PC_lo_lleva_dentro_En_Espaol.mp4\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import ipywidgets as widgets\n",
        "widgets.Audio.from_file(audio_file, autoplay=False, loop=False)"
      ],
      "metadata": {
        "id": "SA1LrZaHbwKs",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 79,
          "referenced_widgets": [
            "86dca4d17526449d931390a7dc4c468c",
            "38fa5d0d6708431395d9c7f5c5e211f5"
          ]
        },
        "outputId": "0d258bc6-14c3-42b4-eed6-8941493c6eb4"
      },
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Audio(value=b'\\x00\\x00\\x00\\x18ftypdash\\x00\\x00\\x00\\x00iso6mp41\\x00\\x00\\x02imoov\\x00\\x00\\x00lmvhd\\x00\\x00\\x00\\x…"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "86dca4d17526449d931390a7dc4c468c"
            }
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import whisper\n",
        "import torch\n",
        "\n",
        "model = whisper.load_model(\"small\")\n",
        "\n",
        "audio = whisper.load_audio(audio_file)\n",
        "audio = whisper.pad_or_trim(audio)\n",
        "\n",
        "# make log-Mel spectrogram and move to the same device as the model\n",
        "mel = whisper.log_mel_spectrogram(audio).to(model.device)\n",
        "\n",
        "# detect the spoken language\n",
        "_, probs = model.detect_language(mel)\n",
        "audio_lang = max(probs, key=probs.get)\n",
        "print(f\"Detected language: {audio_lang}\")"
      ],
      "metadata": {
        "id": "AwzzjGFf7h-v",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "a3e36a0e-9e5d-4a0e-e953-d9c0826b7d27"
      },
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "100%|████████████████████████████████████████| 461M/461M [00:03<00:00, 137MiB/s]\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Detected language: es\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# NLTK helps to split the transcription sentence by sentence\n",
        "# and shows it in a neat manner one below another. You will see it in the output below.\n",
        "\n",
        "import nltk\n",
        "nltk.download('punkt')\n",
        "from nltk import sent_tokenize\n",
        "\n",
        "# decode the audio\n",
        "options = whisper.DecodingOptions(fp16=torch.cuda.is_available(), language=audio_lang, task='transcribe')\n",
        "result = whisper.decode(model, mel, options)\n",
        "\n",
        "# print the recognized text\n",
        "print(\"----\\nTranscription from audio:\")\n",
        "for sent in sent_tokenize(result.text):\n",
        "  print(sent)\n",
        "\n",
        "# decode the audio\n",
        "options = whisper.DecodingOptions(fp16=torch.cuda.is_available(), language=audio_lang, task='translate')\n",
        "result = whisper.decode(model, mel, options)\n",
        "\n",
        "# print the recognized text\n",
        "print(\"----\\nTranslation from audio:\")\n",
        "for sent in sent_tokenize(result.text):\n",
        "  print(sent)"
      ],
      "metadata": {
        "id": "LGaiMqnl6as7",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "39b9f9e6-b64b-4d5f-98d9-b2119fec60ad"
      },
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "[nltk_data] Downloading package punkt to /root/nltk_data...\n",
            "[nltk_data]   Unzipping tokenizers/punkt.zip.\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "----\n",
            "Transcription from audio:\n",
            "¿Quiere utilizar sus programas a toda velocidad?\n",
            "¿Necesita algo dentro de su Pfe que le dé más potencia y que esté preparado para el software del futuro?\n",
            "¿Necesita el microprocesador Intel 486 DX2?\n",
            "Y como es de Intel, usted sabe que es compatible con todo tipo de software.\n",
            "Intel 486 DX2, asegúrese de que su próximo Pfe lo lleva dentro.\n",
            "----\n",
            "Translation from audio:\n",
            "Do you want to use your programs at full speed?\n",
            "Do you need something inside your PC that gives you more power and that is prepared for the software of the future?\n",
            "Do you need the micro processor Intel 486DX2?\n",
            "And as it is Intel, you know that it is compatible with all kinds of software.\n",
            "Intel 486DX2, make sure that your next PC takes it inside.\n"
          ]
        }
      ]
    }
  ]
}